Preliminary

Part 2

This document shows an exemplary downstream analysis on immune gene expression using the SingleCellAlleleExperiment (SCAE) package. The used dataset is under controlled access and not for public usage. The here shown results are part of the masters thesis:

Development of a multi-layer R object to study immune gene expression in spatial (and single-cell) transcriptomics at allele, gene and functional level written by Jonas Schuck.

The project was conducted in the research group ‘Computational Immunology’ lead by Dr. Katharina Imkeller at the Edinger Institute at the University Hospital Frankfurt. Dr. Imkeller provided supervision of the project. Primary corrector is Prof. Dr. Florian Buettner at Goethe University Frankfurt.

This document gives only a brief introduction to the SingleCellAlleleExperiment (SCAE) package as well as an overview for the performed steps described in the package application section of the thesis. For further information conduct the thesis.

Introduction

This is the second part of the exemplary workflow. Continuing with loading the object with the saved results from part 1. Here, an specific region is chosen from the t-SNE plots of the HLA-C immune gene, where a difference in the expression is observed between the two HLA-C alleles C*04:01:01:01 and C*06:02:01:01. Then, differential expression analysis is performed on the subsetted cluster to determine marker genes. This allows to find not only regions that show functional differences, observed by gene expression, but also subtypes of cells that show genetical differences, shown by differences in expression on the allele level.

Loading packages

The following package are abundant for performing the downstream analysis.

library(SingleCellAlleleExperiment)
library(scran)
library(scater)
library(gridExtra)
library(tidyverse)
library(patchwork)
library(cowplot)
library(ggplot2)
library(bluster)

Load data from part 1 of the analysis

redim_scae <- readRDS(file = "~/Work/R/Masterthesis/dev/SCAE/example_workflow_t_110_pca_tsne_perplex.rds")
redim_scae
## class: SingleCellAlleleExperiment 
## dim: 29923 47788 
## metadata(0):
## assays(2): counts logcounts
## rownames(29923): ENSG00000160072.20 ENSG00000228037.1 ... HLA_class_I
##   HLA_class_II
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
##   AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
##   TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(6): PCA_a PCA_g ... TSNE_g_50 TSNE_f_50
## mainExpName: NULL
## altExpNames(0):
reducedDimNames(redim_scae)
## [1] "PCA_a"     "PCA_g"     "PCA_f"     "TSNE_a_50" "TSNE_g_50" "TSNE_f_50"

Clustering

Perform nearest-neighbour clustering on TSNE_g_50. Here, the clusterCells() function from the bluster package.

marker_scae <- redim_scae

nn.clusters <- clusterCells(marker_scae, use.dimred="TSNE_g_50", BLUSPARAM=NNGraphParam(k=50))
colLabels(marker_scae) <- nn.clusters

#renaming the label columns
colnames(colData(marker_scae))
## [1] "Sample"     "Barcode"    "sizeFactor" "label"

Changing the labels column in case multiple cluster-runs are computed.

colnames(colData(marker_scae))[4] <- "label_k_50"
colnames(colData(marker_scae))
## [1] "Sample"     "Barcode"    "sizeFactor" "label_k_50"

Plotting the cluster results on the TSNE_g_50.

plotReducedDim(marker_scae, "TSNE_g_50", colour_by="label_k_50") + theme_bw()


Plot again the HLA-C immune gene with the corresponding alleles to see which sublcuster contains the in-cluster differences between the two alleles. The differences appear to be for example in cluster label 17.

which_tsne <- "TSNE_g_50"

#EGA_hla_c_alleles
tsne_g_c  <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "HLA-C")
tsne_g_c1 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "C*04:01:01:01")
tsne_g_c2 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "C*06:02:01:01")

p1 <- tsne_g_c + tsne_g_c1 + tsne_g_c2
p1


Determine subcluster

Subset the cluster labels of the cluster containing in-cluster differences in the HLA-C plot.

cl_subset <- marker_scae[, colData(marker_scae)$label_k_50 %in% c(1,2,9,27,27,17,20)]

plotReducedDim(cl_subset, "TSNE_g_50", colour_by = "label_k_50") + theme_bw()

Plot again the HLA-C allele expression plots on the subcluster to have a zoomed in view on the chosen subcluster.

hclusc1 <- plotReducedDim(cl_subset, "TSNE_g_50", colour_by = "C*04:01:01:01")
hclusc2 <- plotReducedDim(cl_subset, "TSNE_g_50", colour_by = "C*06:02:01:01")

pclus <- hclusc1 + hclusc2
pclus

Find upregulated markergenes

To determine marker genes for each cluster region, the findMarkers() function from scran is used.

colData(cl_subset)$label_k_50 <- droplevels(colData(cl_subset)$label_k_50)

#markergenes for the subcluster
hclus_markers <- findMarkers(cl_subset, test = "binom", direction = "up", groups = colData(cl_subset)$label_k_50)
hclus_markers
## List of length 6
## names(6): 1 2 9 17 20 27

Return the list of determined top marker genes for each subcluster.

#---------------subcluster1---------------#
interesting_c1 <- hclus_markers[[1]]

top_list_c1 <- interesting_c1[interesting_c1$Top <=5,]
ens_only_c1 <- rownames(top_list_c1[startsWith(rownames(top_list_c1), "ENS"),])

plotExpression(cl_subset, x = "label_k_50", features = ens_only_c1) + 
  labs(title = "Upregulated markers subcluster 1")

plotReducedDim(cl_subset, "TSNE_g_50", colour_by = ens_only_c1[1])  + 
  labs(title = "Top marker subcluster 1") + theme_bw()

#---------------subcluster2---------------#
interesting_c2 <- hclus_markers[[2]]

top_list_c2 <- interesting_c2[interesting_c2$Top <=5,]
ens_only_c2 <- rownames(top_list_c2[startsWith(rownames(top_list_c2), "ENS"),])

plotExpression(cl_subset, x = "label_k_50", features = ens_only_c2) + 
  labs(title = "Upregulated markers subcluster 2")

plotReducedDim(cl_subset, "TSNE_g_50", colour_by = ens_only_c2[1])  + 
  labs(title = "Top marker subcluster 2") + theme_bw()

#---------------subcluster9-----------#
interesting_c3 <- hclus_markers[[3]]

top_list_c3 <- interesting_c3[interesting_c3$Top <=3,]
ens_only_c3 <- rownames(top_list_c3[startsWith(rownames(top_list_c3), "ENS"),])

plotExpression(cl_subset, x = "label_k_50", features = ens_only_c3) + 
  labs(title = "Upregulated markers subcluster 9")

plotReducedDim(cl_subset, "TSNE_g_50", colour_by = ens_only_c3[1])  + 
  labs(title = "Top marker subcluster 9") + theme_bw()

#---------------subcluster17--!!!-------------#
interesting_c4 <- hclus_markers[[4]]

top_list_c4 <- interesting_c4[interesting_c4$Top <=5,]
ens_only_c4 <- rownames(top_list_c4[startsWith(rownames(top_list_c4), "ENS"),])

plotExpression(cl_subset, x = "label_k_50", features = ens_only_c4) +
  labs(title = "Upregulated markers subcluster 17")

plotReducedDim(cl_subset, "TSNE_g_50", colour_by = ens_only_c4[2])  +
  labs(title = "Top marker subcluster 17") + theme_bw()

#---------------subcluster20---------------#
interesting_c20 <- hclus_markers[[5]]

top_list_c20 <- interesting_c20[interesting_c20$Top <=5,]
ens_only_c20 <- rownames(top_list_c20[startsWith(rownames(top_list_c20), "ENS"),])

plotExpression(cl_subset, x = "label_k_50", features = ens_only_c20) +
  labs(title = "Upregulated markers subcluster 20")

plotReducedDim(cl_subset, "TSNE_g_50", colour_by = ens_only_c20[1])  +
  labs(title = "Top marker subcluster 20") + theme_bw()

#---------------subcluster27---------------#
interesting_c21 <- hclus_markers[[6]]

top_list_c21 <- interesting_c21[interesting_c21$Top <=5,]
ens_only_c21 <- rownames(top_list_c21[startsWith(rownames(top_list_c21), "ENS"),])
#ens_only_c21
plotExpression(cl_subset, x = "label_k_50", features = ens_only_c21) +
  labs(title = "Upregulated markers subcluster 27")

plotReducedDim(cl_subset, "TSNE_g_50", colour_by = ens_only_c21[1])  +
  labs(title = "Top marker subcluster 27") + theme_bw()

Expression plots for subclusters across both alleles

Overview plot for the marker gene for subcluster label 17.

hclus <- plotReducedDim(cl_subset, "TSNE_g_50", colour_by = "label_k_50") + theme_bw()
c1    <- plotExpression(cl_subset, x = "label_k_50", features = "C*04:01:01:01")
c2    <- plotExpression(cl_subset, x = "label_k_50", features = "C*06:02:01:01")
bach2 <- plotExpression(cl_subset, x = "label_k_50", features="ENSG00000112182.16")

unified_exp <- hclus + theme_bw() +  labs(title = "A") + xlab("TSNE_g") + ylab("TSNE_g") + 
                       guides(color=guide_legend("label"))  + theme(title = element_text(size = 12)) +
                  c1 + theme_bw() +  labs(title = "B") + xlab("label")  + theme(title = element_text(size = 12)) +
               bach2 + theme_bw() +  labs(title = "C") + xlab("label")  + theme(title = element_text(size = 12)) +
                  c2 + theme_bw() +  xlab("label")  + theme(title = element_text(size = 12))
unified_exp

Session Information

sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] bluster_1.4.0                         cowplot_1.1.1                        
##  [3] patchwork_1.1.3                       lubridate_1.9.2                      
##  [5] forcats_1.0.0                         stringr_1.5.0                        
##  [7] dplyr_1.1.2                           purrr_1.0.2                          
##  [9] readr_2.1.4                           tidyr_1.3.0                          
## [11] tibble_3.2.1                          tidyverse_2.0.0                      
## [13] gridExtra_2.3                         scater_1.22.0                        
## [15] ggplot2_3.4.3                         scran_1.22.1                         
## [17] scuttle_1.4.0                         SingleCellAlleleExperiment_0.0.0.9000
## [19] SingleCellExperiment_1.16.0           SummarizedExperiment_1.24.0          
## [21] Biobase_2.54.0                        GenomicRanges_1.46.1                 
## [23] GenomeInfoDb_1.30.1                   IRanges_2.28.0                       
## [25] S4Vectors_0.32.4                      BiocGenerics_0.40.0                  
## [27] MatrixGenerics_1.6.0                  matrixStats_1.0.0                    
## [29] BiocStyle_2.22.0                     
## 
## loaded via a namespace (and not attached):
##  [1] ggbeeswarm_0.7.2          colorspace_2.1-0         
##  [3] XVector_0.34.0            BiocNeighbors_1.12.0     
##  [5] rstudioapi_0.15.0         farver_2.1.1             
##  [7] ggrepel_0.9.3             bit64_4.0.5              
##  [9] AnnotationDbi_1.56.2      fansi_1.0.4              
## [11] xml2_1.3.5                sparseMatrixStats_1.6.0  
## [13] cachem_1.0.8              knitr_1.43               
## [15] jsonlite_1.8.7            cluster_2.1.4            
## [17] dbplyr_2.3.3              png_0.1-8                
## [19] BiocManager_1.30.22       compiler_4.1.2           
## [21] httr_1.4.6                dqrng_0.3.0              
## [23] Matrix_1.6-1              fastmap_1.1.1            
## [25] limma_3.50.3              cli_3.6.1                
## [27] BiocSingular_1.10.0       htmltools_0.5.6          
## [29] prettyunits_1.1.1         tools_4.1.2              
## [31] rsvd_1.0.5                igraph_1.5.1             
## [33] gtable_0.3.3              glue_1.6.2               
## [35] GenomeInfoDbData_1.2.7    rappdirs_0.3.3           
## [37] Rcpp_1.0.11               jquerylib_0.1.4          
## [39] vctrs_0.6.3               Biostrings_2.62.0        
## [41] DelayedMatrixStats_1.16.0 xfun_0.40                
## [43] beachmat_2.10.0           timechange_0.2.0         
## [45] lifecycle_1.0.3           irlba_2.3.5.1            
## [47] statmod_1.5.0             XML_3.99-0.14            
## [49] org.Hs.eg.db_3.14.0       edgeR_3.36.0             
## [51] zlibbioc_1.40.0           scales_1.2.1             
## [53] hms_1.1.3                 parallel_4.1.2           
## [55] yaml_2.3.7                curl_5.0.2               
## [57] memoise_2.0.1             sass_0.4.7               
## [59] biomaRt_2.50.3            stringi_1.7.12           
## [61] RSQLite_2.3.1             highr_0.10               
## [63] ScaledMatrix_1.2.0        filelock_1.0.2           
## [65] BiocParallel_1.28.3       rlang_1.1.1              
## [67] pkgconfig_2.0.3           bitops_1.0-7             
## [69] evaluate_0.21             lattice_0.21-8           
## [71] labeling_0.4.2            bit_4.0.5                
## [73] tidyselect_1.2.0          magrittr_2.0.3           
## [75] R6_2.5.1                  generics_0.1.3           
## [77] metapod_1.2.0             DelayedArray_0.20.0      
## [79] DBI_1.1.3                 pillar_1.9.0             
## [81] withr_2.5.0               KEGGREST_1.34.0          
## [83] RCurl_1.98-1.12           crayon_1.5.2             
## [85] utf8_1.2.3                BiocFileCache_2.2.1      
## [87] tzdb_0.4.0                rmarkdown_2.24           
## [89] viridis_0.6.4             progress_1.2.2           
## [91] locfit_1.5-9.8            grid_4.1.2               
## [93] blob_1.2.4                digest_0.6.33            
## [95] munsell_0.5.0             beeswarm_0.4.0           
## [97] viridisLite_0.4.2         vipor_0.4.5              
## [99] bslib_0.5.1